
    ACID: Abstractive, Content-Based IDs for Document Retrieval with Language Models

    Generative retrieval (Wang et al., 2022; Tay et al., 2022) is a new approach to end-to-end document retrieval that directly generates document identifiers given an input query. Techniques for designing effective, high-quality document IDs remain largely unexplored. We introduce ACID, in which each document's ID is composed of abstractive keyphrases generated by a large language model, rather than an integer ID sequence as in past work. We compare our method with the current state-of-the-art technique for ID generation, which produces IDs through hierarchical clustering of document embeddings. We also examine simpler methods of generating natural-language document IDs, including the naive approach of using the first k words of each document as its ID, or the words with the highest BM25 scores in that document. We show that ACID improves top-10 and top-20 accuracy by 15.6% and 14.4% (relative), respectively, over the state-of-the-art baseline on the MSMARCO 100k retrieval task, and by 4.4% and 4.0% on the Natural Questions 100k retrieval task. Our results demonstrate the effectiveness of human-readable, natural-language IDs in generative retrieval with LMs. The code for reproducing our results and the keyword-augmented datasets will be released upon formal publication.
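    The two naive baselines above are simple enough to sketch. The following is a minimal illustration, not the paper's code: the function names and BM25 parameters (k1, b) are assumptions, and the LLM keyphrase generation used by ACID itself is not shown.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def first_k_id(doc, k=8):
    """Naive baseline: the document's first k words become its ID."""
    return " ".join(tokenize(doc)[:k])

def bm25_id(doc, corpus, k=8, k1=1.5, b=0.75):
    """Baseline: the k terms with the highest BM25 weight in `doc`.
    Assumes `doc` is one of the documents in `corpus`."""
    docs = [tokenize(d) for d in corpus]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    tokens = tokenize(doc)
    tf = Counter(tokens)

    def weight(term):
        idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
        sat = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(tokens) / avgdl))
        return idf * sat

    return " ".join(sorted(tf, key=weight, reverse=True)[:k])
```

Either function maps a document to a short natural-language string that a generative retriever could be trained to emit in place of an integer ID.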

    Pebbles versus planetesimals

    In the core accretion scenario, a massive core forms first and then accretes an envelope. When discussing how this core forms, some divergences appear. Early planet formation scenarios predict the accretion of km-sized bodies, called planetesimals, while more recent works suggest growth by the accretion of pebbles, which are cm-sized objects. These two accretion models are often discussed separately, and here we aim to compare the outcomes of the two models under identical initial conditions. We use two distinct codes: one computing planetesimal accretion, the other pebble accretion. Using a population synthesis approach, we compare the simulated planets and study the impact of the two solid accretion models, focusing on the formation of single planets. We find that the planetesimal model predicts the formation of more giant planets, while the pebble accretion model forms more super-Earth-mass planets. This is due to the pebble isolation mass, which prevents planets formed by pebble accretion from accreting gas efficiently before reaching M_iso. This translates into a population of planets that are not massive enough to accrete a substantial envelope but that lie in a mass range where type I migration is very efficient. We also find higher gas mass fractions for a given core mass in the pebble model compared to the planetesimal one, caused by luminosity differences. This also implies planets with lower densities, which could be confirmed observationally. Focusing on giant planets, we conclude that the sensitivity of their formation differs between the models: for pebble accretion, the results depend strongly on the time at which the embryos are formed and on the period over which solids are accreted, while for the planetesimal model they depend on the planetesimal size and on how the solids available to form planetesimals are partitioned.
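    The pebble isolation mass invoked above is the core mass at which a planet perturbs the gas disc enough to halt the inward pebble flux. The abstract gives no expression for it; a commonly used approximation from the literature (Lambrechts et al. 2014), shown here purely for illustration, scales with the cube of the disc aspect ratio:

```python
def pebble_isolation_mass(aspect_ratio):
    """Approximate M_iso in Earth masses, using the literature scaling
    M_iso ~ 20 M_earth * (h / 0.05)**3 with h = H/r (an assumption;
    the paper's codes may use a different prescription)."""
    return 20.0 * (aspect_ratio / 0.05) ** 3

# In a flared disc h grows with orbital distance, so M_iso grows outward:
for r_au, h in [(1.0, 0.03), (5.0, 0.05), (20.0, 0.08)]:
    print(f"r = {r_au:4.1f} au, h = {h:.2f} -> M_iso ~ {pebble_isolation_mass(h):5.1f} M_earth")
```

Because M_iso is only a few Earth masses in the inner disc under this scaling, cores formed there by pebble accretion stall below the mass needed for efficient gas accretion, which is the bottleneck described above.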

    How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers

    The attention mechanism is considered the backbone of the widely used Transformer architecture. It contextualizes the input by computing input-specific attention matrices. We find that this mechanism, while powerful and elegant, is not as important as typically thought for pretrained language models. We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones -- the average attention weights over multiple inputs. We use PAPA to analyze several established pretrained Transformers on six downstream tasks. We find that without any input-dependent attention, all models achieve competitive performance -- an average relative drop of only 8% from the probing baseline. Further, little or no performance drop is observed when replacing half of the input-dependent attention matrices with constant (input-independent) ones. Interestingly, we show that better-performing models lose more from applying our method than weaker models do, suggesting that utilization of the input-dependent attention mechanism might be a factor in their success. Our results motivate research on simpler alternatives to input-dependent attention, as well as on methods for better utilizing this mechanism in the Transformer architecture. Comment: Findings of EMNLP 2022.
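    The probing idea is straightforward to sketch: average a head's attention matrices over many inputs, then use that frozen matrix in place of the input-dependent one. The PyTorch code below is a generic illustration under assumed names (an HF-style model exposing output_attentions), not the paper's implementation.

```python
import torch

def average_attention(model, batches, layer, head, seq_len=128):
    """Average one head's attention matrices over a set of inputs.
    Assumes every batch is padded to the same length `seq_len`."""
    total = torch.zeros(seq_len, seq_len)
    count = 0
    for batch in batches:
        with torch.no_grad():
            out = model(**batch, output_attentions=True)  # HF-style flag
        attn = out.attentions[layer][:, head]  # (batch, seq, seq)
        total += attn.sum(dim=0)
        count += attn.shape[0]
    avg = total / count
    return avg / avg.sum(dim=-1, keepdim=True)  # rows sum to 1 again

def constant_attention(values, const_attn):
    """Apply the frozen attention matrix to the value vectors,
    ignoring the current input's queries and keys entirely.
    values: (batch, seq, d); const_attn: (seq, seq)."""
    seq = values.shape[1]
    return const_attn[:seq, :seq] @ values
```

The probe then swaps `constant_attention` into some or all heads and measures the downstream performance drop.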

    TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

    Despite thousands of researchers, engineers, and artists actively working to improve text-to-image generation models, systems often fail to produce images that accurately align with their text inputs. We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Specifically, given a text input, we automatically generate several question-answer pairs using a language model. We calculate image faithfulness by checking whether existing VQA models can answer these questions using the generated image. TIFA is a reference-free metric that allows for fine-grained and interpretable evaluations of generated images. TIFA also correlates better with human judgments than existing metrics. Based on this approach, we introduce TIFA v1.0, a benchmark consisting of 4K diverse text inputs and 25K questions across 12 categories (object, counting, etc.). We present a comprehensive evaluation of existing text-to-image models using TIFA v1.0 and highlight the limitations and challenges of current models. For instance, we find that current text-to-image models, despite doing well on color and material, still struggle with counting, spatial relations, and composing multiple objects. We hope our benchmark will help carefully measure research progress in text-to-image synthesis and provide valuable insights for further research.
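    The scoring loop described above reduces to a small function. In this sketch, `generate_qa_pairs` (the LM step) and `vqa_answer` (the VQA step) are placeholder callables for whichever models one plugs in; they are not the paper's API, and a real implementation would need softer answer matching than exact string equality.

```python
from typing import Callable, List, Tuple

def tifa_score(
    prompt: str,
    image,                                   # the generated image being judged
    generate_qa_pairs: Callable[[str], List[Tuple[str, str]]],
    vqa_answer: Callable[[object, str], str],
) -> float:
    """Fraction of LM-generated questions the VQA model answers correctly."""
    qa_pairs = generate_qa_pairs(prompt)     # e.g. [("what color is the dog?", "brown"), ...]
    if not qa_pairs:
        return 0.0
    correct = sum(
        vqa_answer(image, question).strip().lower() == answer.strip().lower()
        for question, answer in qa_pairs
    )
    return correct / len(qa_pairs)
```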

    Multidisciplinary design of a more electric regional aircraft including certification constraints

    The use of electrified on-board systems is increasingly required to reduce aircraft complexity, polluting emissions, and life cycle cost. However, more-electric and all-electric aircraft configurations are still uncommon in civil aviation, and their certifiability has yet to be proven in some aircraft segments. The aim of the present paper is to define a multidisciplinary design problem that includes disciplines pertaining to the certification domain. In particular, the study focuses on the preliminary design of a 19-passenger small regional turboprop aircraft. Different on-board systems architectures with increasing electrification levels are considered. These architectures imply the use of bleedless technologies, including electrified ice protection and environmental control systems. The use of electric actuators for secondary surfaces and landing gear is also considered. The aircraft design, which covers the aerodynamic, structural, systems, and propulsion domains, is then assessed by several certification disciplines. In particular, minimum performance, external noise, and safety assessments are included in the workflow, giving insights into the aircraft's certifiability. The results show reductions of 3% in MTOM and 3% in fuel mass, depending on the systems architecture selected. From the certification side, the design has proven to be certifiable, and the margins with respect to the certification constraints can be controlled to improve the overall design.

    GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation

    Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository. Their adoption, however, is so far limited to tasks that can be reliably evaluated in an automatic manner. This work introduces GENIE, an extensible human evaluation leaderboard, which brings the ease of leaderboards to text generation tasks. GENIE automatically posts leaderboard submissions to crowdsourcing platforms, asking human annotators to evaluate them on various axes (e.g., correctness, conciseness, fluency), and compares their answers to various automatic metrics. We introduce several English datasets to GENIE, representing four core challenges in text generation: machine translation, summarization, commonsense reasoning, and machine comprehension. We provide formal granular evaluation metrics and identify areas for future research. We make GENIE publicly available and hope that it will spur progress in language generation models as well as in their automatic and manual evaluation.
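    The comparison between crowdsourced judgments and automatic metrics mentioned above is typically a rank correlation per evaluation axis. A minimal sketch with illustrative numbers (not GENIE's pipeline or data):

```python
from scipy.stats import spearmanr

# Hypothetical per-submission scores: mean annotator fluency ratings
# versus an automatic metric for the same five submissions.
human_scores  = [4.2, 3.1, 4.8, 2.5, 3.9]
metric_scores = [0.61, 0.42, 0.70, 0.35, 0.58]

rho, p_value = spearmanr(human_scores, metric_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```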